Job-to-person search system


[Screenshot: search form with the facets Recent job titles, Job group, Job class, City (postal/ZIP code or city, with a radius such as +25 miles), Employers, Years of experience, IT skills, Language skills and Education level (e.g. Master (3), Post-Master, Secondary Education (19)); each criterion can be set to "Nice to have" or "Must have"]


[Screenshot: active query criteria: City US Arizona City AZ (25 miles); Years of experience 3 to 5 years; Job class Hospitality; Recent job titles Customer service representative (+23); Job group Customer Service Personnel]


[Screenshot: result list with 37 results, e.g. Jeffrey (Technical Support Coordinator, Customer Service Representative), Christina (Operations Manager Environmental Services, Director of Food Service), Jerry (Order Picker, Customer Service, Warehouseman), Joe (Well Tester/Frac Support, Stinger Welding), Sylvia (Supervisor, Dispatcher). The match indicator of one candidate shows: City: US Arizona City AZ (25 miles) ✓; Job class: Hospitality ✓; Job group: Customer Service Personnel X; Recent job titles: Customer service representative X; Years of experience X]










































Generated query

All Positions: [Technical Account Manager]
Profession: Account Manager (Technical Products)
Job Category: Sales and Trading
Job Type: Articles and Products Representatives
City: US Portland OR (75 miles)
IT Skills: SQL (+8) & WSDL (+13) & web service & Excel (+10) & Troubleshooting & CRM & ITIL (+3) & OEM (+57)
Years of Experience: 3 to 5 years / 6 to 10 years
Full text: client business & Account Management (+7) & sales (+14) & SoapUI (+34) & pricing (+5) & scheduling (+10) & Customer Service (+13) & web service calls
Education: Master's / Doctoral or PhD
Languages: English & Spanish
Last Employer: Bank of America































Match indicator

[Screenshot: match indicator for one search result (AirWatch by VMware / US Atlanta GA), showing which query clauses the candidate matches]

Matched (✓): Education: Master's, Doctoral or PhD; IT Skills: SQL; IT Skills: Excel; IT Skills: CRM; IT Skills: OEM; Job Category: Sales and Trading; Languages: English; Languages: Spanish; Years of Experience; sales; scheduling; Customer Service; Account Management; SoapUI

Not matched (X): All Positions: [Technical Account Manager]; City: US Portland OR (75 miles); IT Skills: WSDL; IT Skills: web service; IT Skills: Troubleshooting; IT Skills: ITIL; Job Type: Articles and Products Representatives; Last Employer: Bank of America; Profession: Account Manager (Technical Products); client business; pricing; web service calls










Faceted search 


• Multiple explicit and independent dimensions, called facets 

• Lets users refine the search by choosing facet values

• No candidate is ideal: queries contain many should-have clauses


Scoring of search results 


• Term-frequency based metric 
e.g. BM25, TF-IDF 


• Facet weights 


$$\text{score} = 1.5 \cdot \text{TF-IDF}(\text{jobtitle}) + 2.0 \cdot \text{TF-IDF}(\text{skill}) + 0.7 \cdot \text{TF-IDF}(\text{location}) + 0.25 \cdot \text{TF-IDF}(\text{languages})$$
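A minimal sketch of this weighted facet score, assuming each facet is scored on its own with a simple TF-IDF variant (tf times `ln(N/df) + 1`) over a hypothetical two-candidate corpus. A real engine computes TF-IDF or BM25 internally; the tokenization, the idf smoothing and the names `field_tf_idf` and `facet_score` are illustrative assumptions, only the weights come from the formula.

```python
import math
from collections import Counter

# Facet weights taken from the example formula; everything else is illustrative.
FACET_WEIGHTS = {"jobtitle": 1.5, "skill": 2.0, "location": 0.7, "languages": 0.25}


def field_tf_idf(query_terms, doc_terms, field_docs):
    """Simple TF-IDF for one field: sum over query terms of tf * (ln(N / df) + 1)."""
    n_docs = len(field_docs)
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for terms in field_docs if term in terms)
        if tf[term] and df:
            score += tf[term] * (math.log(n_docs / df) + 1.0)
    return score


def facet_score(query, doc, corpus):
    """Weighted sum of per-facet TF-IDF scores, one term per facet."""
    total = 0.0
    for facet, weight in FACET_WEIGHTS.items():
        field_docs = [set(d.get(facet, [])) for d in corpus]
        total += weight * field_tf_idf(query.get(facet, []), doc.get(facet, []), field_docs)
    return total


# Hypothetical toy corpus of two candidates, already tokenized per facet.
corpus = [
    {"jobtitle": ["software", "engineer"], "skill": ["python", "java"],
     "location": ["berlin"], "languages": ["english", "german"]},
    {"jobtitle": ["ore", "mining", "technician"], "skill": ["drilling", "mining"],
     "location": ["java"], "languages": ["english", "javanese"]},
]
query = {"jobtitle": ["software", "engineer"], "skill": ["java"],
         "location": ["amsterdam"], "languages": ["english"]}

for doc in corpus:
    print(round(facet_score(query, doc, corpus), 3))
```

Because each facet is scored separately, the word "java" in the second candidate's location field does not count towards the skill clause.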


Tuning the system: objectives 


“If I search for a skill ‘Java’ I want the candidates that 
also have ‘Java’ in their Jobtitle field to be weighted
higher” 

“Education will be a less important match, the more 
years of experience a candidate has” 

“We should weight location matches less when finding 
candidates in IT” 


Learning to rank

• Learn a parameterized ranking model 

• That optimizes ranking order 

• Re-learn for personalization or preference change 


Learning to rank by tuning facet weights 


• Exhaustive search for the optimal set of weights

• Improved our retrieval by 6% (NDCG metric) 


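$$\text{score} = \big[\text{TF-IDF}(\text{jobtitle}),\ \text{TF-IDF}(\text{skill}),\ \dots,\ \text{TF-IDF}(\text{languages})\big] \cdot \big[1.5,\ 2.0,\ \dots,\ 0.25\big]^{\top}$$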
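The exhaustive search can be sketched as a grid search: every combination of candidate weights is tried, and the one with the highest mean NDCG over assessed queries wins. This assumes a small hypothetical grid, two facets and toy labelled data; NDCG uses the common exponential gain `2^rel - 1` with a `log2` position discount, which may differ from the variant used in the talk.

```python
import itertools
import math


def dcg(labels):
    """DCG with exponential gain (2^rel - 1) and a log2(position + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels))


def ndcg(labels_in_ranked_order):
    """DCG of the given order divided by the DCG of the ideal (label-sorted) order."""
    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


def mean_ndcg(weights, queries):
    """queries: list of result lists, each a list of (facet score vector, label)."""
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: sum(w * f for w, f in zip(weights, d[0])),
                        reverse=True)
        total += ndcg([label for _, label in ranked])
    return total / len(queries)


def grid_search(queries, grid, n_facets):
    """Try every weight combination; keep the first one with the best mean NDCG."""
    best = max(itertools.product(grid, repeat=n_facets),
               key=lambda w: mean_ndcg(w, queries))
    return best, mean_ndcg(best, queries)


# Hypothetical data: features = [TF-IDF(jobtitle), TF-IDF(skill)], labels 0..2.
queries = [
    [([1.0, 0.0], 0), ([0.2, 0.9], 2), ([0.5, 0.4], 1)],
    [([0.9, 0.1], 0), ([0.1, 0.8], 2)],
]
best_weights, best_ndcg = grid_search(queries, grid=[0.25, 0.5, 1.0, 1.5, 2.0], n_facets=2)
print(best_weights, round(best_ndcg, 3))
```

The cost grows as `len(grid) ** n_facets`, which is one practical reason this approach stays limited to a handful of facet weights.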




Tuning facet weights: limitations 


• Cannot consider interdependency of facet field dimensions 

• Cannot take into account the actual content of fields 
o only match indicators 


Learning objectives 


Take into account facet field content 
Model facet field interdependencies 


Learning to rank 


• Machine Learning from user feedback 

• Input: set of {query, list of assessed documents}

o Each document has a relevance label from feedback


[Screenshot: example query (Employer: Cambridge Women's Resources Centre Cambridge; Jobtitle: teaching assistant; Full text: bristol; Age: 1980 to 1984) with its result list; relevance labels come either from implicit feedback (user actions on the results) or from explicit feedback (assessments)]
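One common on-disk representation of this input is the SVMlight/LETOR text format used by many LTR tools: one line per query-document pair, `<label> qid:<query id> <index>:<value> ... # <comment>`. A minimal sketch of writing and parsing it; the `AssessedDoc` class, the candidate ids and the feature values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AssessedDoc:
    doc_id: str
    label: int                                   # relevance label from feedback
    features: list = field(default_factory=list)


def to_letor_lines(query_id, docs):
    """Serialize one query's assessed documents: one line per query-document pair."""
    lines = []
    for d in docs:
        # ':g' keeps six significant digits, enough for this toy data.
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(d.features, start=1))
        lines.append(f"{d.label} qid:{query_id} {feats} # {d.doc_id}")
    return lines


def parse_letor_line(line):
    """Parse '<label> qid:<id> 1:v1 2:v2 ... # comment' (assumes dense, ordered indices)."""
    data, _, comment = line.partition("#")
    tokens = data.split()
    label = int(tokens[0])
    query_id = tokens[1].split(":", 1)[1]
    features = [float(tok.split(":", 1)[1]) for tok in tokens[2:]]
    return query_id, AssessedDoc(comment.strip(), label, features)


# Hypothetical query 1 with two assessed candidates.
docs = [AssessedDoc("cand-1", 2, [1.0, 0.5, 0.0, 1.0]),
        AssessedDoc("cand-2", 0, [0.0, 0.0, 0.0, 1.0])]
lines = to_letor_lines(1, docs)
print(lines[0])   # 2 qid:1 1:1 2:0.5 3:0 4:1 # cand-1
```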










































































Learning to rank 


• Algorithm learns how to combine query and document content
to optimize the ordering given the relevance labels
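To make "learning how to combine" concrete, here is a minimal pairwise sketch in the spirit of Joachims' RankSVM: every pair of documents for the same query with different labels becomes a constraint that the better one should score higher, and a linear weight vector is fitted by subgradient descent on the L2-regularized pairwise hinge loss. This is an illustrative plain-Python stand-in on hypothetical data, not the algorithm or library used in the talk; `epochs`, `lr` and `reg` are arbitrary choices.

```python
import itertools


def train_pairwise_linear(queries, n_features, epochs=100, lr=0.1, reg=0.01):
    """Fit w by subgradient descent on the L2-regularized pairwise hinge loss.

    queries: list of result lists, each a list of (feature vector, relevance label).
    Every pair with different labels should satisfy w . (f_better - f_worse) >= 1.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs in queries:
            for (fa, la), (fb, lb) in itertools.combinations(docs, 2):
                if la == lb:
                    continue                      # equal labels give no preference
                better, worse = (fa, fb) if la > lb else (fb, fa)
                diff = [b - c for b, c in zip(better, worse)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                for i in range(n_features):
                    hinge_grad = -diff[i] if margin < 1.0 else 0.0
                    w[i] -= lr * (reg * w[i] + hinge_grad)
    return w


def score(w, features):
    """Relevance score of one query-document pair under the learned linear model."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical toy data: features = [TF-IDF(jobtitle), TF-IDF(skill)], labels 0..2.
queries = [
    [([1.0, 0.0], 0), ([0.2, 0.9], 2), ([0.5, 0.4], 1)],
    [([0.9, 0.1], 0), ([0.1, 0.8], 2)],
]
w = train_pairwise_linear(queries, n_features=2)
print(w)
```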















































Learning to rank 


Output: model that gives a relevance score given a query 
and document 











Dynamic top K reranking 
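The learned model is applied as a reranker on the engine's first-pass results (the execution-time measurements later use the top 100). A minimal sketch, assuming K is simply a parameter; how K is chosen dynamically in the talk is not shown here, and `model_score` is a hypothetical stand-in for the learned model.

```python
def rerank_top_k(results, model_score, k=100):
    """Rerank only the first k results of a first-pass ranking.

    results: list of (doc_id, first_pass_score), already sorted by the search engine.
    model_score: callable doc_id -> learned relevance score (hypothetical stand-in).
    Results beyond k keep their first-pass order, so the model runs at most k times.
    """
    head, tail = results[:k], results[k:]
    # sorted() is stable: documents with equal model scores keep their first-pass order.
    head = sorted(head, key=lambda r: model_score(r[0]), reverse=True)
    return head + tail


first_pass = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
model = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 5.0}.get   # hypothetical model scores
print(rerank_top_k(first_pass, model, k=3))
```

Document "d" has the highest model score but stays last, because it is outside the top K and is never rescored.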





















































Typical features 


“In learning to rank, each query-document pair is represented by a 
multi-dimensional feature vector, and each dimension of the vector is a 
feature indicating how relevant or important the document is with respect 
to the query.” * 

Used in LTR papers: 1,2,3 

• TF-IDF, BM25, DFR, Language Model, cosine similarity, rank in other 
engines, etc. 

• Match indicator between the whole query and the whole document


* "LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval", T. Qin, T. Liu, J. Xu and H. Li, 2010

1 "Optimizing Search Engines using Clickthrough Data", T. Joachims, 2002

2 "AdaRank: A Boosting Algorithm for Information Retrieval", J. Xu and H. Li, 2007

3 "Multileave Gradient Descent for Fast Online Learning to Rank", A. Schuth, H. Oosterhuis, S. Whiteson and M. de Rijke, 2016
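A minimal sketch of such whole-query/whole-document features: Okapi BM25 (with a Lucene-style idf and the common defaults `k1=1.2`, `b=0.75`), cosine similarity of term counts, and a match indicator taken as the fraction of query terms found anywhere in the document. The function names and toy documents are hypothetical, and the documents are flat token lists, so facet fields are ignored entirely.

```python
import math
from collections import Counter


def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 of a tokenized document for a tokenized query (Lucene-style idf)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    total = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        total += idf * tf[term] * (k1 + 1) / norm
    return total


def cosine(query, doc):
    """Cosine similarity between term-count vectors."""
    q, d = Counter(query), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def match_indicator(query, doc):
    """Fraction of distinct query terms occurring anywhere in the document."""
    terms = set(query)
    return sum(1 for t in terms if t in doc) / len(terms) if terms else 0.0


def whole_doc_features(query, doc, corpus):
    return [bm25(query, doc, corpus), cosine(query, doc), match_indicator(query, doc)]


query = "software engineer java".split()
doc_1 = "software engineer python java berlin english german".split()
doc_2 = "ore mining technician drilling mining java english javanese".split()
corpus = [doc_1, doc_2]
print(whole_doc_features(query, doc_1, corpus))
print(whole_doc_features(query, doc_2, corpus))
```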


Bag of words

Query: software engineer data mining java amsterdam english

Document 1:
job title: software engineer
skill: python, java
location: berlin
languages: english, german
→ 4 matches

Document 2:
job title: ore mining technician
skill: drilling, mining
location: java
languages: English, javanese
→ 4 matches
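The counts can be reproduced by pooling all document fields into one bag of words and counting every document token that occurs in the query, so the repeated "mining" counts twice. This counting rule is an assumption chosen because it matches the numbers on the slide; the function name is hypothetical.

```python
def bag_of_words_matches(query, doc_fields):
    """Count document tokens, from all fields pooled together, that occur in the query."""
    query_terms = set(query.lower().split())
    doc_tokens = [tok for value in doc_fields.values()
                  for tok in value.lower().replace(",", " ").split()]
    return sum(1 for tok in doc_tokens if tok in query_terms)


query = "software engineer data mining java amsterdam english"
doc_1 = {"job title": "software engineer", "skill": "python, java",
         "location": "berlin", "languages": "english, german"}
doc_2 = {"job title": "ore mining technician", "skill": "drilling, mining",
         "location": "java", "languages": "English, javanese"}
print(bag_of_words_matches(query, doc_1), bag_of_words_matches(query, doc_2))  # 4 4
```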























Split up in facet fields

Query: job title: software engineer | skill: data mining | skill: java | location: amsterdam | language: english

Document 1 (job title: software engineer; skill: python, java; location: berlin; languages: english, german) → 3 matches

Document 2 (job title: ore mining technician; skill: drilling, mining; location: java; languages: english, javanese) → 1 match
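With the query split into (field, value) clauses, a clause only matches when its whole value occurs in the same field of the document. A minimal sketch reproducing the 3 vs. 1 matches above, assuming exact value matching; the data layout and function name are hypothetical.

```python
# Query as (field, value) clauses; documents as field -> list of values.
query = [("job title", "software engineer"), ("skill", "data mining"),
         ("skill", "java"), ("location", "amsterdam"), ("language", "english")]
doc_1 = {"job title": ["software engineer"], "skill": ["python", "java"],
         "location": ["berlin"], "language": ["english", "german"]}
doc_2 = {"job title": ["ore mining technician"], "skill": ["drilling", "mining"],
         "location": ["java"], "language": ["english", "javanese"]}


def facet_matches(query, doc):
    """Count query clauses whose whole value occurs in the same field of the document."""
    return sum(1 for fld, value in query if value in doc.get(fld, []))


print(facet_matches(query, doc_1), facet_matches(query, doc_2))  # 3 1
```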

























One feature per field

Query: job title: software engineer | skill: data mining | skill: java | location: amsterdam | language: english

Document (matches in brackets): job title: [software engineer]; skill: python, [java]; location: berlin; languages: [english], german

Feature vector: jobtitle 1/1, skill 1/2, location 0, language 1/1
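A minimal sketch of turning the per-field matches into one feature per field: the fraction of that field's query clauses matched in the same document field, which yields the vector jobtitle 1/1, skill 1/2, location 0, language 1/1. The fraction-of-clauses definition is inferred from the example; the function name is hypothetical.

```python
def per_field_features(query, doc, fields):
    """One feature per field: fraction of that field's query clauses matched in the same field."""
    vector = []
    for fld in fields:
        clauses = [value for f, value in query if f == fld]
        matched = sum(1 for value in clauses if value in doc.get(fld, []))
        vector.append(matched / len(clauses) if clauses else 0.0)
    return vector


query = [("job title", "software engineer"), ("skill", "data mining"),
         ("skill", "java"), ("location", "amsterdam"), ("language", "english")]
doc = {"job title": ["software engineer"], "skill": ["python", "java"],
       "location": ["berlin"], "language": ["english", "german"]}
print(per_field_features(query, doc, ["job title", "skill", "location", "language"]))
# [1.0, 0.5, 0.0, 1.0]
```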



















































Linear models 


Used in many papers:
o seminal papers 1
o papers about leveraging user preferences 2
o papers about online learning / interleaving 3

Also used in, e.g., the documentation of Solr’s LTR contrib module


1 “Optimizing Search Engines using Clickthrough Data”, T. Joachims, 2002

2 “A contextual-bandit approach to personalized news article recommendation”, L. Li, W. Chu, J. Langford and R. E. Schapire, 2010

3 “Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval”, K. Hofmann, S. Whiteson and M. de Rijke, 2013


Linear models

End up with a weight vector you can multiply with feature vectors:

$$\text{score} = \mathbf{w} \cdot \mathbf{f} = \sum_{i=1}^{n} w_i f_i$$
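The scores table later lists ridge regression as the linear algorithm. A minimal sketch, assuming a pointwise setup in which relevance labels are regressed directly on per-field feature vectors with closed-form ridge regression; the training data and `alpha` are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Hypothetical training data: one row per query-document pair,
# columns = per-field features [jobtitle, skill, location, language],
# target = relevance label from feedback (pointwise regression).
X = np.array([[1.0, 0.5, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 0.0]])
y = np.array([2.0, 0.0, 3.0, 1.0])

alpha = 0.1
w = fit_ridge(X, y, alpha=alpha)
scores = X @ w                 # score = w . f for every query-document pair
ranking = np.argsort(-scores)  # indices of the pairs, best first
print(w, ranking)
```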






















































Take into account facet field content

Feature types: query-document match features, document features, query features

• Categorical: e.g. denoting job class, skill, etc.
• Interval: e.g. years of experience
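A minimal sketch of encoding facet content as features: categorical values (job class) become one-hot features for the query and for the document, an interval value (years of experience) is used as a number, and a query-document match feature records whether the cities agree. The vocabulary, dictionary keys and function names are hypothetical; the feature naming follows the `jobclass_doc_...` style used in the slides.

```python
JOB_CLASSES = ["IT", "Retail", "Hospitality", "Management"]   # hypothetical vocabulary


def one_hot(value, vocabulary, prefix):
    """Categorical feature: one 0/1 feature per possible value."""
    return {f"{prefix}_{v}": 1.0 if v == value else 0.0 for v in vocabulary}


def encode(query, doc):
    """Combine query, document and query-document match features into one dict."""
    features = {}
    features.update(one_hot(query.get("jobclass"), JOB_CLASSES, "jobclass_query"))  # query, categorical
    features.update(one_hot(doc.get("jobclass"), JOB_CLASSES, "jobclass_doc"))      # document, categorical
    features["experience_years"] = float(doc.get("experience_years", 0))           # document, interval
    city = query.get("city")
    features["location_match"] = 1.0 if city is not None and city == doc.get("city") else 0.0  # match
    return features


f = encode({"jobclass": "IT", "city": "Portland"},
           {"jobclass": "IT", "city": "Portland", "experience_years": 4})
print(f["jobclass_doc_IT"], f["jobclass_doc_Retail"], f["experience_years"], f["location_match"])
```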


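Model facet field interdependencies

Example feature vector: 0.0 1.0 0.0 0.0 0.0 0.5 0.0 ...
(e.g. one feature records that jobclass:IT was in the document, another that jobclass:Retail was not in the document)

Each feature is multiplied by its own weight w_1, ..., w_n, so a linear model cannot express interactions between features.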










Model facet field interdependencies 


Use nonlinear ranking model based on e.g. 

• Nonlinear neural networks 

• Nonlinear SVM 

• Decision trees 


Model facet field interdependencies

Decision tree

[Figure: decision tree with splits on experience_years, location_match and jobclass_doc_Management]










Model facet field interdependencies

• “We should weight location matches less when finding candidates in IT”

[Figure: decision tree splitting on job_class_doc_IT > 0, with leaf values 1.0, 1.2, 0.3 and 1.4]
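A hand-written depth-2 tree shows how such a model expresses the interaction: the score gain from a location match depends on the branch taken for job class. The leaf values below are hypothetical and are not the slide's tree. In a linear model, `w_loc * location_match` would add the same amount regardless of job class.

```python
def tree_score(features):
    """Hand-written depth-2 regression tree; leaf values are hypothetical."""
    if features["job_class_doc_IT"] > 0:
        # IT candidates: a location match adds little.
        return 1.1 if features["location_match"] > 0 else 1.0
    # Other candidates: a location match adds a lot.
    return 1.5 if features["location_match"] > 0 else 0.5


def location_effect(is_it):
    """Score gain from a location match, given the candidate's job class."""
    with_match = tree_score({"job_class_doc_IT": is_it, "location_match": 1})
    without = tree_score({"job_class_doc_IT": is_it, "location_match": 0})
    return with_match - without


print(location_effect(1), location_effect(0))
```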










Model facet field interdependencies

• “If I search for a skill ‘Java’ I want the candidates that
also have ‘Java’ in their Jobtitle field to be weighted
higher”

[Figure: decision tree splitting on jobtitle_contains_word_from_skill > 0]
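The split uses a cross-field feature that combines query content with a different document field. A minimal sketch, assuming the feature means case-insensitive word overlap between the queried skills and the document's job title; the exact definition in the talk is not given, and the argument names are hypothetical.

```python
def jobtitle_contains_word_from_skill(query_skills, doc_jobtitle):
    """1.0 if any word of a queried skill occurs as a word in the document's job title."""
    title_words = set(doc_jobtitle.lower().split())
    skill_words = {word for skill in query_skills for word in skill.lower().split()}
    return 1.0 if title_words & skill_words else 0.0


print(jobtitle_contains_word_from_skill(["Java"], "Senior Java Developer"))  # 1.0
print(jobtitle_contains_word_from_skill(["Java"], "JavaScript Developer"))   # 0.0
```

Matching on whole words keeps "JavaScript" from counting as "Java"; a tree can then split on this feature and give such candidates a higher score.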




Model facet field interdependencies 

• “Education will be a less important match, the more 
years of experience a candidate has” 








Scores

Model type      Algorithm          Performance
Linear          Ridge regression   NDCG +6%
Decision tree   LambdaMART         NDCG +16%
Decision tree   Random Forests     NDCG +22%








Scores: risk vs. reward

[Chart: number of queries per NDCG score, "baseline" vs "reranking-model"]
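Besides comparing the two NDCG distributions, risk and reward can be quantified per query: how many queries the reranking model helps, how many it hurts, and the mean gain. A minimal sketch on hypothetical per-query NDCG values; the function name and the tolerance `eps` are assumptions.

```python
def risk_reward(baseline, reranked, eps=1e-9):
    """Compare per-query NDCG of two systems; both dicts map query id -> NDCG."""
    wins = sum(1 for q in baseline if reranked[q] > baseline[q] + eps)
    losses = sum(1 for q in baseline if reranked[q] < baseline[q] - eps)
    mean_gain = sum(reranked[q] - baseline[q] for q in baseline) / len(baseline)
    return {"wins": wins, "losses": losses,
            "ties": len(baseline) - wins - losses, "mean_gain": mean_gain}


baseline = {"q1": 0.50, "q2": 0.60, "q3": 0.70}   # hypothetical NDCG values
reranked = {"q1": 0.80, "q2": 0.40, "q3": 0.70}
print(risk_reward(baseline, reranked))
```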





















































Execution time 


• Applying reranking on top 100

o index: 1,000,000 documents
o model: 1000 trees, each max. 7 leaves

• Avg. query execution time increase with the original library: +22%


Execution time 


Culprit: transformation from internal API object
to ranking-library object
(done for each query-document pair)

feature extraction → double[] features → String features → DataPoint { String relevance_label; String query_id; String description; float[] features } → ranking model






Execution time 


After refactoring model application

feature extraction → double[] features → ranking model

Avg. query execution time increase: +4%
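The effect of the refactoring can be illustrated with two scoring paths for the same hypothetical linear model: one passes the extracted feature array straight to the model, the other formats the features as text, wraps them in a record and parses them back for every query-document pair. This mimics the pattern described above and is not the code of the ranking library; `WEIGHTS` and both function names are hypothetical, and the measured times will vary by machine.

```python
import timeit

WEIGHTS = [0.4, 1.2, -0.3, 0.8]   # hypothetical linear model


def score_direct(features):
    """Refactored path: hand the extracted feature array straight to the model."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def score_via_strings(features, query_id="q1"):
    """Original-style path: format features as text, wrap them in a record, parse them back."""
    line = f"0 qid:{query_id} " + " ".join(f"{i}:{v!r}" for i, v in enumerate(features, 1))
    parsed = [float(tok.split(":", 1)[1]) for tok in line.split()[2:]]
    return sum(w * f for w, f in zip(WEIGHTS, parsed))


features = [0.5, 1.0, 0.0, 0.25]
t_direct = timeit.timeit(lambda: score_direct(features), number=10_000)
t_strings = timeit.timeit(lambda: score_via_strings(features), number=10_000)
print(f"direct: {t_direct:.4f}s, via strings: {t_strings:.4f}s")
```

Both paths return the same score (Python's `repr` of a float round-trips exactly); only the per-pair conversion work differs.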






Next steps: Implicit user feedback gathering 


• Transform user actions to feedback signals 

o transformation model may differ per customer 

• Avoid modeling an action loop 

o ...unless you want to optimize an action 
o validate with human-made assessments 

• Avoid modeling a reinforcing feedback loop 
o deal with position / selection bias 
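A minimal sketch of transforming user actions into feedback signals with a dwell-time rule, in the style of the "Dwell=100s" and "Dwell=50s" classifiers in the table that follows: a click counts as a positive signal only if the user stayed at least the threshold on the candidate. The `Interaction` record, the log and the binary labels are hypothetical; a real transformation model may use more actions and may differ per customer.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    doc_id: str
    clicked: bool
    dwell_seconds: float = 0.0


def dwell_classifier(threshold_s):
    """Click-to-signal rule: a click is a positive signal if dwell time >= threshold_s."""
    def classify(event):
        return 1 if event.clicked and event.dwell_seconds >= threshold_s else 0
    return classify


# Hypothetical interaction log for one query.
log = [Interaction("cand-1", True, 120.0),
       Interaction("cand-2", True, 60.0),
       Interaction("cand-3", False)]
labels_100s = {e.doc_id: dwell_classifier(100)(e) for e in log}
labels_50s = {e.doc_id: dwell_classifier(50)(e) for e in log}
print(labels_100s, labels_50s)
```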




































Implicit feedback gathering

[Diagram: extracted click features → click-to-signal classification (may differ per customer) → validated against explicit feedback assessments using NDCG; the classification with maximum NDCG is found on the train folds and evaluated on the test fold]

Fold   Max classifier   Train NDCG   Test NDCG
0      Dwell=100s       0.41         0.37
1      Dwell=100s       0.42         0.28
2      Dwell=100s       0.41         0.37
3      Dwell=100s       0.41         0.42
4      Dwell=100s       0.40         0.51
5      Dwell=100s       0.41         0.37
6      Dwell=50s        0.40         0.42
7      Dwell=50s        0.41         0.38
8      Dwell=50s        0.42         0.29
9      Dwell=50s        0.40         0.45
Mean                    0.41         0.39

find classification(s) with maximum NDCG
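The selection behind the table can be sketched as k-fold cross-validation: for each held-out fold, pick the click-to-signal classifier with maximum mean NDCG on the other (train) folds, then report its NDCG on the test fold. This assumes a hypothetical `evaluate(candidate, fold)` callable that returns the NDCG of a classifier's signals against the explicit assessments on one fold; the per-fold values below are toy numbers, not the talk's data.

```python
def mean_ndcg_on(candidate, folds, evaluate):
    """Mean NDCG of one candidate classifier over the given folds."""
    return sum(evaluate(candidate, f) for f in folds) / len(folds)


def select_per_fold(candidates, n_folds, evaluate):
    """For each held-out fold, pick the classifier with max mean NDCG on the other folds.

    evaluate(candidate, fold) -> NDCG of that click-to-signal classification on one fold,
    measured against explicit assessments (a hypothetical callable).
    Returns rows of (test fold, chosen classifier, train NDCG, test NDCG).
    """
    rows = []
    for test_fold in range(n_folds):
        train_folds = [f for f in range(n_folds) if f != test_fold]
        best = max(candidates, key=lambda c: mean_ndcg_on(c, train_folds, evaluate))
        rows.append((test_fold, best, mean_ndcg_on(best, train_folds, evaluate),
                     evaluate(best, test_fold)))
    return rows


# Hypothetical per-fold NDCG of two candidate classifiers.
toy_ndcg = {"Dwell=50s": [0.30, 0.40, 0.50], "Dwell=100s": [0.40, 0.45, 0.20]}
rows = select_per_fold(list(toy_ndcg), 3, lambda c, f: toy_ndcg[c][f])
for row in rows:
    print(row)
```

As in the table, the chosen classifier can differ between folds, and train NDCG is not a reliable predictor of test NDCG on a single fold.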




































Conclusions 


• LTR can substantially improve faceted search
o with minimal impact on execution times

• Determine your general learning objectives

o then select features and algorithm accordingly, so that they fit together

• Ranking models aren’t static

o their performance differs per query type / user


Thanks! 


Any questions? 


contact: vanbelle@textkernel.nl
join us: textkernel.careers


